Results 1 - 20 of 6,643
1.
JMIR Public Health Surveill ; 10: e47064, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38728069

ABSTRACT

BACKGROUND: Smell disorders are commonly reported with COVID-19 infection. The smell-related issues associated with COVID-19 may be prolonged, even after the respiratory symptoms are resolved. These smell dysfunctions can range from anosmia (complete loss of smell) or hyposmia (reduced sense of smell) to parosmia (smells perceived differently) or phantosmia (smells perceived without an odor source being present). Because people generally find it difficult to talk about their smell experiences, patients struggle to express or label the symptoms they experience, thereby complicating diagnosis. The complexity of these symptoms can be an additional burden for patients and health care providers and thus needs further investigation. OBJECTIVE: This study aims to explore the smell disorder concerns of patients and to provide an overview of each specific smell disorder by using the longitudinal survey conducted in 2020 by the Global Consortium for Chemosensory Research, an international research group created ad hoc to study chemosensory dysfunctions. We aimed to extend the existing knowledge on smell disorders related to COVID-19 by analyzing a large data set of self-reported descriptive comments with methods from natural language processing. METHODS: We included self-reported data on the description of changes in smell provided by 1560 participants at 2 timepoints (second survey completed between 23 and 291 days). Text data from participants who still had smell disorders at the second timepoint (long-haulers) were compared with the text data of those who did not (non-long-haulers). Specifically, 3 aims were pursued in this study. The first aim was to classify smell disorders based on the participants' self-reports. The second aim was to classify the sentiment of each self-report by using a machine learning approach, and the third aim was to find particular food and nonfood keywords that were more salient among long-haulers than among non-long-haulers. RESULTS: We found that parosmia (odds ratio [OR] 1.78, 95% CI 1.35-2.37; P<.001) and hyposmia (OR 1.74, 95% CI 1.34-2.26; P<.001) were reported more frequently by long-haulers than by non-long-haulers. Furthermore, a significant relationship was found between long-hauler status and the sentiment of the self-report (P<.001). Finally, we found specific keywords that were more typical of long-haulers than of non-long-haulers, for example, fire, gas, wine, and vinegar. CONCLUSIONS: Our findings are consistent with those of previous studies, indicating that self-reports, which can easily be extracted online, may offer valuable information to health care providers and improve the understanding of smell disorders. At the same time, our study of self-reports provides new insights for future studies investigating smell disorders.
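Odds ratios like those reported for parosmia and hyposmia come from a 2×2 table of group membership (long-hauler vs. not) against symptom mention. A minimal Python sketch using the Woolf log-OR confidence interval; the counts are invented for illustration, since the abstract does not give the underlying table:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """2x2 table: a=exposed cases, b=exposed non-cases,
    c=unexposed cases, d=unexposed non-cases.
    Returns (OR, lower, upper) via the Woolf log-OR method."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: parosmia mentions in long-haulers vs non-long-haulers
or_, lo, hi = odds_ratio_ci(120, 380, 80, 450)
```

The real estimates and intervals in the study come from its actual data; this only shows the arithmetic behind the reported figures.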


Subject(s)
COVID-19 , Natural Language Processing , Olfaction Disorders , Self Report , Humans , COVID-19/complications , COVID-19/epidemiology , Olfaction Disorders/epidemiology , Olfaction Disorders/etiology , Cross-Sectional Studies , Male , Female , Longitudinal Studies , Middle Aged , Adult , Aged , Young Adult
2.
PLoS One ; 19(5): e0303519, 2024.
Article in English | MEDLINE | ID: mdl-38723044

ABSTRACT

OBJECTIVE: To establish whether or not a natural language processing technique could identify two common inpatient neurosurgical comorbidities using only text reports of inpatient head imaging. MATERIALS AND METHODS: A training and testing dataset of reports of 979 CT or MRI scans of the brain for patients admitted to the neurosurgery service of a single hospital in June 2021 or to the Emergency Department between July 1-8, 2021, was identified. A variety of machine learning and deep learning algorithms utilizing natural language processing were trained on the training set (84% of the total cohort) and tested on the remaining images. A subset comparison cohort (n = 76) was then assessed to compare output of the best algorithm against real-life inpatient documentation. RESULTS: For "brain compression", a random forest classifier outperformed other candidate algorithms with an accuracy of 0.81 and area under the curve of 0.90 in the testing dataset. For "brain edema", a random forest classifier again outperformed other candidate algorithms with an accuracy of 0.92 and AUC of 0.94 in the testing dataset. In the provider comparison dataset, for "brain compression," the random forest algorithm demonstrated better accuracy (0.76 vs 0.70) and sensitivity (0.73 vs 0.43) than provider documentation. For "brain edema," the algorithm again demonstrated better accuracy (0.92 vs 0.84) and AUC (0.45 vs 0.09) than provider documentation. DISCUSSION: A natural language processing-based machine learning algorithm can reliably and reproducibly identify selected common neurosurgical comorbidities from radiology reports. CONCLUSION: This result may justify the use of machine learning-based decision support to augment provider documentation.
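The accuracy and AUC figures above can be computed from predicted scores without any ML library: AUC equals the Mann-Whitney U statistic, the probability that a randomly chosen positive case outscores a randomly chosen negative one. A self-contained sketch (not the study's code; labels and scores are invented):

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    fraction of positive/negative pairs in which the positive case
    receives the higher score (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one positive is outscored by one negative
score = auc([1, 0, 1, 0], [0.8, 0.7, 0.6, 0.5])
```

This pairwise formulation is exact but quadratic; production metric libraries sort once and integrate the ROC curve instead.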


Subject(s)
Comorbidity , Natural Language Processing , Humans , Algorithms , Inpatients/statistics & numerical data , Female , Male , Machine Learning , Magnetic Resonance Imaging/methods , Documentation , Middle Aged , Tomography, X-Ray Computed , Neurosurgical Procedures , Aged , Deep Learning
3.
J Orthop Surg Res ; 19(1): 287, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38725085

ABSTRACT

BACKGROUND: The Centers for Medicare and Medicaid Services (CMS) imposes payment penalties for readmissions following total joint replacement surgeries. This study focuses on total hip, knee, and shoulder arthroplasty procedures, as they account for most joint replacement surgeries. Apart from being a burden to health care systems, readmissions are also troublesome for patients. Several prior studies utilized only structured data from electronic health records (EHRs) without considering adjustments for gender and payor bias. METHODS: For this study, a dataset of 38,581 total knee, hip, and shoulder replacement surgeries performed from 2015 to 2021 at Novant Health was gathered. This data was used to train a random forest machine learning model to predict the combined endpoint of emergency department (ED) visit or unplanned readmission within 30 days of discharge, or discharge to a skilled nursing facility (SNF) following the surgery. A total of 98 features covering laboratory results, diagnoses, vitals, medications, and utilization history were extracted. A natural language processing (NLP) model fine-tuned from Clinical BERT was used to generate an NLP risk score feature for each patient based on their clinical notes. To address societal biases, a feature bias analysis was performed in conjunction with propensity score matching. A threshold optimization algorithm from the Fairlearn toolkit was used to mitigate gender and payor biases to promote fairness in predictions. RESULTS: The model achieved an Area Under the Receiver Operating characteristic Curve (AUROC) of 0.738 (95% confidence interval, 0.724 to 0.754) and an Area Under the Precision-Recall Curve (AUPRC) of 0.406 (95% confidence interval, 0.384 to 0.433).
Considering an outcome prevalence of 16%, these metrics indicate the model's ability to accurately discriminate between readmission and non-readmission cases within the context of total arthroplasty surgeries while adjusting patient scores in the model to mitigate bias based on patient gender and payor. CONCLUSION: This work culminated in a model that identifies the most predictive and protective features associated with the combined endpoint. This model serves as a tool to empower healthcare providers to proactively intervene based on these influential factors without introducing bias towards protected patient classes, effectively mitigating the risk of negative outcomes and ultimately improving quality of care regardless of socioeconomic factors.
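Fairlearn's threshold optimization works by choosing group-specific decision thresholds rather than one global cutoff. The following is a deliberately simplified stand-in for that idea, not the Fairlearn API: it equalizes selection rates across groups by picking each group's threshold from its own score distribution (group names, scores, and the target rate are all invented):

```python
def group_thresholds(scores, groups, target_rate):
    """Pick a per-group score threshold so every group is flagged at
    approximately the same rate -- a toy illustration of threshold-based
    fairness post-processing."""
    thresholds = {}
    for g in set(groups):
        g_scores = sorted((s for s, grp in zip(scores, groups) if grp == g),
                          reverse=True)
        k = max(1, round(target_rate * len(g_scores)))  # flag top-k per group
        thresholds[g] = g_scores[k - 1]
    return thresholds

# Hypothetical risk scores for two payor groups
scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.3, 0.05]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
cutoffs = group_thresholds(scores, groups, target_rate=0.25)
```

The actual ThresholdOptimizer additionally optimizes against a chosen fairness constraint (e.g., equalized odds) using the true labels; this sketch shows only the per-group-threshold mechanism.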


Subject(s)
Cost-Benefit Analysis , Machine Learning , Patient Readmission , Humans , Patient Readmission/economics , Patient Readmission/statistics & numerical data , Female , Male , Aged , Natural Language Processing , Middle Aged , Arthroplasty, Replacement, Knee/economics , Arthroplasty, Replacement, Hip/economics , Arthroplasty, Replacement/economics , Arthroplasty, Replacement/adverse effects , Risk Assessment/methods , Preoperative Period , Aged, 80 and over , Quality Improvement , Random Forest
4.
JCO Clin Cancer Inform ; 8: e2400051, 2024 May.
Article in English | MEDLINE | ID: mdl-38713889

ABSTRACT

This editorial discusses the promise and challenges of successfully integrating natural language processing methods into electronic health records for timely, robust, and fair oncology pharmacovigilance.


Subject(s)
Artificial Intelligence , Electronic Health Records , Medical Oncology , Natural Language Processing , Pharmacovigilance , Humans , Medical Oncology/methods , Data Collection/methods , Neoplasms/drug therapy , Adverse Drug Reaction Reporting Systems
5.
JMIR Ment Health ; 11: e53730, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38722220

ABSTRACT

Background: There is growing concern around the use of sodium nitrite (SN) as an emerging means of suicide, particularly among younger people. Given the limited information on the topic from traditional public health surveillance sources, we studied posts made to an online suicide discussion forum, "Sanctioned Suicide," which is a primary source of information on the use and procurement of SN. Objective: This study aims to determine the trends in SN purchase and use, as obtained via data mining from subscriber posts on the forum. We also aim to determine the substances and topics commonly co-occurring with SN, as well as the geographical distribution of users and sources of SN. Methods: We collected all publicly available posts from the site's inception in March 2018 to October 2022. Using data-driven methods, including natural language processing and machine learning, we analyzed the trends in SN mentions over time, including the locations of SN consumers and the sources from which SN is procured. We developed a transformer-based source and location classifier to determine the geographical distribution of the sources of SN. Results: Posts pertaining to SN show a rise in popularity, and there were statistically significant correlations between real-life use of SN and suicidal intent when compared to data from the Centers for Disease Control and Prevention (CDC) Wide-Ranging Online Data for Epidemiologic Research (ρ=0.727; P<.001) and the National Poison Data System (ρ=0.866; P=.001). We observed frequent co-mentions of antiemetics, benzodiazepines, and acid regulators with SN. Our proposed machine learning-based source and location classifier can detect potential sources of SN with an accuracy of 72.92% and showed consumption in the United States and elsewhere. Conclusions: Vital information about SN and other emerging mechanisms of suicide can be obtained from online forums.
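The ρ values above are Spearman rank correlations: the Pearson correlation computed over ranks rather than raw values, which makes it robust to monotone but nonlinear trends. A stdlib-only sketch with tie handling (the data here is illustrative, not the study's):

```python
def rank(xs):
    """Average 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

In practice one would call `scipy.stats.spearmanr`, which also returns the P value; the hand-rolled version shows what the coefficient measures.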


Subject(s)
Natural Language Processing , Self-Injurious Behavior , Sodium Nitrite , Humans , Self-Injurious Behavior/epidemiology , Suicide/trends , Suicide/psychology , Adult , Internet , Male , Female , Social Media , Young Adult
6.
PLoS One ; 19(5): e0302502, 2024.
Article in English | MEDLINE | ID: mdl-38743773

ABSTRACT

ChatGPT has demonstrated impressive abilities and impacted various aspects of human society since its creation, gaining widespread attention from different social spheres. This study aims to comprehensively assess public perception of ChatGPT on Reddit. The dataset was collected via Reddit, a social media platform, and includes 23,733 posts and comments related to ChatGPT. Firstly, to examine public attitudes, this study conducts content analysis utilizing topic modeling with the Latent Dirichlet Allocation (LDA) algorithm to extract pertinent topics. Furthermore, sentiment analysis categorizes user posts and comments as positive, negative, or neutral using Textblob and Vader in natural language processing. The result of topic modeling shows that seven topics regarding ChatGPT are identified, which can be grouped into three themes: user perception, technical methods, and impacts on society. Results from the sentiment analysis show that 61.6% of the posts and comments hold favorable opinions on ChatGPT. They emphasize ChatGPT's ability to prompt and engage in natural conversations with users, without relying on complex natural language processing. It provides suggestions for ChatGPT developers to enhance its usability design and functionality. Meanwhile, stakeholders, including users, should comprehend the advantages and disadvantages of ChatGPT in human society to promote ethical and regulated implementation of the system.
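TextBlob and VADER are lexicon- and rule-based sentiment tools: each token carries a polarity score, and the scores are aggregated into a positive/negative/neutral label. A toy version conveys the idea; the tiny lexicon and the neutrality threshold below are invented, and real tools add negation handling, intensifiers, and punctuation rules:

```python
# Invented miniature lexicon -- VADER's real lexicon has ~7,500 entries.
LEXICON = {"great": 1, "love": 1, "useful": 1, "impressive": 1,
           "bad": -1, "hate": -1, "useless": -1, "wrong": -1}

def polarity(text):
    """Mean lexicon score over matched tokens; 0.0 if none match."""
    tokens = text.lower().split()
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(text, eps=0.05):
    """Map polarity to a three-way sentiment label."""
    p = polarity(text)
    return "positive" if p > eps else "negative" if p < -eps else "neutral"
```

Running two classifiers (as the study does with TextBlob and VADER) and comparing their labels is a common sanity check on lexicon-based sentiment.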


Subject(s)
Public Opinion , Social Media , Humans , Natural Language Processing , Unsupervised Machine Learning , Attitude , Algorithms
7.
J Med Internet Res ; 26: e52399, 2024 05 13.
Article in English | MEDLINE | ID: mdl-38739445

ABSTRACT

BACKGROUND: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. OBJECTIVE: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. METHODS: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third round, the participants scored these items. RESULTS: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. 
The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. CONCLUSIONS: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.


Subject(s)
Delphi Technique , Natural Language Processing , Humans , Machine Learning , Delivery of Health Care/methods , Medical Informatics/methods
8.
Front Public Health ; 12: 1392180, 2024.
Article in English | MEDLINE | ID: mdl-38716250

ABSTRACT

Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain. Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection. Result: The hold-out evaluations resulted in average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweet data from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation.
By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.
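Token-level F1 scores like those above combine precision and recall over per-token tags. A stdlib sketch for one tag of interest (the "ADE"/"O" tag scheme here is invented for illustration):

```python
def token_f1(gold, pred, positive):
    """Precision, recall, and F1 for one tag, given parallel lists of
    gold and predicted per-token labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Gold says tokens 2-3 are adverse-event words; the model finds only one
gold = ["O", "ADE", "ADE", "O"]
pred = ["O", "ADE", "O", "O"]
prec, rec, f1 = token_f1(gold, pred, "ADE")
```

Averaging such per-fold scores over repeated hold-out splits, as the study does, gives a more stable estimate than a single split.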


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Natural Language Processing , Pharmacovigilance , Social Media , Humans , Adverse Drug Reaction Reporting Systems
9.
Med Ref Serv Q ; 43(2): 196-202, 2024.
Article in English | MEDLINE | ID: mdl-38722609

ABSTRACT

Named entity recognition (NER) refers to computational systems, in use since the early 1990s, that apply various computing strategies to extract information from raw text input. With rapid advances in AI and computing, NER models have gained significant attention and serve as foundational tools across numerous professional domains to organize unstructured data for research and practical applications. This is particularly evident in the medical and healthcare fields, where NER models are essential for efficiently extracting critical information from complex documents that are challenging to review manually. Despite these successes, NER still has limitations in fully comprehending the nuances of natural language. However, the development of more advanced and user-friendly models promises to significantly improve the work experience of professional users.
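The input/output contract of an NER system, spans of text mapped to entity types, can be shown with a toy dictionary-and-regex tagger. Modern NER models are statistical, and the drug list and dose pattern below are invented, but the interface is the same:

```python
import re

# Invented dictionary and pattern -- a real system would use a trained model.
DRUGS = {"aspirin", "ibuprofen", "metformin"}
DOSE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE)

def extract_entities(text):
    """Return (span, type) pairs found in the text."""
    entities = []
    for m in DOSE.finditer(text):
        entities.append((m.group(), "DOSE"))
    for token in re.findall(r"[A-Za-z]+", text):
        if token.lower() in DRUGS:
            entities.append((token, "DRUG"))
    return entities

ents = extract_entities("Patient took aspirin 81 mg daily")
```

Dictionary lookups fail on unseen surface forms and ambiguous words, which is precisely the "natural language nuance" limitation the passage describes and why statistical models displaced rule-only systems.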


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Information Storage and Retrieval/methods , Humans , Artificial Intelligence
10.
J Med Internet Res ; 26: e52499, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38696245

ABSTRACT

This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT's performance with human annotators' in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated human results, with a high degree of agreement noted: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about the generalizability due to ChatGPT's training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.
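The κ values above are chance-corrected agreement statistics. For two raters (e.g., ChatGPT vs. a human annotator) Cohen's kappa illustrates the computation; the study reports Fleiss' κ, which generalizes the same idea to more than two raters, and the labels below are invented:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from each rater's label distribution."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy annotations: adverse event present ("AE") or absent ("none")
kappa = cohens_kappa(["AE", "AE", "none", "none"],
                     ["AE", "none", "none", "none"])
```

Raw percent agreement alone (the 94.4% figure) can be inflated when one label dominates; kappa discounts that, which is why both are reported.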


Subject(s)
Social Media , Humans , Social Media/statistics & numerical data , Dronabinol/adverse effects , Natural Language Processing
11.
Sci Rep ; 14(1): 10785, 2024 05 11.
Article in English | MEDLINE | ID: mdl-38734712

ABSTRACT

Large language models (LLMs), like ChatGPT, Google's Bard, and Anthropic's Claude, showcase remarkable natural language processing capabilities. Evaluating their proficiency in specialized domains such as neurophysiology is crucial to understanding their utility in research, education, and clinical applications. This study aims to assess and compare the effectiveness of LLMs in answering neurophysiology questions in both English and Persian (Farsi), covering a range of topics and cognitive levels. Twenty questions covering four topics (general, sensory system, motor system, and integrative) and two cognitive levels (lower-order and higher-order) were posed to the LLMs. Physiologists scored the essay-style answers on a scale of 0-5 points. Statistical analysis compared the scores across model, language, topic, and cognitive level. Qualitative analysis identified reasoning gaps. In general, the models demonstrated good performance (mean score = 3.87/5), with no significant difference between languages or cognitive levels. Performance was strongest in the motor system (mean = 4.41), while the weakest was observed in integrative topics (mean = 3.35). Detailed qualitative analysis uncovered deficiencies in reasoning, discerning priorities, and knowledge integration. This study offers valuable insights into LLMs' capabilities and limitations in the field of neurophysiology. The models demonstrate proficiency in general questions but face challenges in advanced reasoning and knowledge integration. Targeted training could address gaps in knowledge and causal reasoning. As LLMs evolve, rigorous domain-specific assessments will be crucial for evaluating advancements in their performance.


Subject(s)
Language , Neurophysiology , Humans , Neurophysiology/methods , Natural Language Processing , Cognition/physiology
12.
Health Informatics J ; 30(2): 14604582241240680, 2024.
Article in English | MEDLINE | ID: mdl-38739488

ABSTRACT

Objective: This study examined major themes and sentiments, and their trajectories and interactions over time, using subcategories of Reddit data. The aim was to facilitate decision-making for psychosocial rehabilitation. Materials and Methods: We utilized natural language processing techniques, including topic modeling and sentiment analysis, on a dataset consisting of more than 38,000 topics, comments, and posts collected from a subreddit dedicated to the experiences of people who tested positive for COVID-19. In this longitudinal exploratory analysis, we studied the dynamics between the most dominant topics and subjects' emotional states over an 18-month period. Results: Our findings highlight the evolution of the textual and sentimental status of major topics discussed by COVID-19 survivors over an extended period of time during the pandemic. We particularly studied the pre- and post-vaccination eras as a turning point in the timeline of the pandemic. The results show that not only does the relevance of topics change over time, but the emotions attached to them also vary. Major social events, such as the administration of vaccines or the enforcement of nationwide policies, are also reflected in the discussions and inquiries of social media users. This is particularly true of the emotional state (i.e., the sentiment and polarity of feelings) of those who have experienced COVID-19 personally. Discussion: Cumulative societal knowledge regarding the COVID-19 pandemic impacts the patterns with which people discuss their experiences, concerns, and opinions. The subjects' emotional state with respect to different topics was also impacted by extraneous factors and events, such as vaccination.
Conclusion: By mining major topics, sentiments, and trajectories demonstrated in COVID-19 survivors' interactions on Reddit, this study contributes to the emerging body of scholarship on COVID-19 survivors' mental health outcomes, providing insights into the design of mental health support and rehabilitation services for COVID-19 survivors.


Subject(s)
COVID-19 , SARS-CoV-2 , Survivors , Humans , COVID-19/psychology , COVID-19/epidemiology , Survivors/psychology , Data Mining/methods , Pandemics , Natural Language Processing , Social Media/trends , Longitudinal Studies
13.
Clin Imaging ; 110: 110164, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38691911

ABSTRACT

Natural language processing (NLP), a form of artificial intelligence, allows free-text clinical documentation to be integrated in ways that facilitate data analysis, data interpretation, and the delivery of individualized medical and obstetrical care. In this cross-sectional study, we identified all births during the study period carrying the radiology-confirmed diagnosis of fibroid uterus in pregnancy (defined as a largest fibroid diameter of >5 cm) by using an NLP platform, and compared the result to non-NLP-derived data using ICD-10 codes for the same diagnosis. We then compared the two sets of data and stratified documentation gaps by race. Using fibroid uterus in pregnancy as a marker, we found that Black patients were more likely to have the diagnosis entered late into the patient's chart or to have missing documentation of the diagnosis. With appropriate algorithm definitions, cross-referencing, and thorough validation steps, NLP can contribute to identifying areas of documentation gaps and improve quality of care.
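Comparing an NLP-derived cohort against an ICD-coded cohort reduces to set operations over patient identifiers: patients found only by the NLP pipeline represent potential coding gaps, and patients found only by codes warrant chart review. A minimal sketch (the identifiers are invented):

```python
def documentation_gaps(nlp_ids, icd_ids):
    """Partition patients by which data source flagged the diagnosis."""
    nlp_ids, icd_ids = set(nlp_ids), set(icd_ids)
    return {
        "nlp_only": nlp_ids - icd_ids,  # in imaging text, never coded
        "icd_only": icd_ids - nlp_ids,  # coded, not confirmed in reports
        "both": nlp_ids & icd_ids,      # concordant documentation
    }

gaps = documentation_gaps(nlp_ids=[101, 102, 103], icd_ids=[102, 103, 104])
```

Stratifying each partition by a demographic variable, as the study does by race, then turns the set sizes into a documentation-equity comparison.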


Subject(s)
Documentation , Natural Language Processing , Uterine Neoplasms , Humans , Female , Pregnancy , Cross-Sectional Studies , Documentation/standards , Documentation/statistics & numerical data , Uterine Neoplasms/diagnostic imaging , Racism , Leiomyoma/diagnostic imaging , Adult , Obstetrics , Pregnancy Complications, Neoplastic/diagnostic imaging
14.
Sci Data ; 11(1): 455, 2024 May 04.
Article in English | MEDLINE | ID: mdl-38704422

ABSTRACT

Due to the complexity of the biomedical domain, the ability to capture semantically meaningful representations of terms in context is a long-standing challenge. Despite important progress in the past years, no evaluation benchmark has been developed to evaluate how well language models represent biomedical concepts according to their corresponding context. Inspired by the Word-in-Context (WiC) benchmark, in which word sense disambiguation is reformulated as a binary classification task, we propose a novel dataset, BioWiC, to evaluate the ability of language models to encode biomedical terms in context. BioWiC comprises 20,156 instances, covering over 7,400 unique biomedical terms, making it the largest WiC dataset in the biomedical domain. We evaluate BioWiC both intrinsically and extrinsically and show that it could be used as a reliable benchmark for evaluating context-dependent embeddings in biomedical corpora. In addition, we conduct several experiments using a variety of discriminative and generative large language models to establish robust baselines that can serve as a foundation for future research.
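A WiC-style benchmark casts word sense disambiguation as binary classification: given one term in two contexts, does it carry the same meaning in both? A common baseline thresholds the cosine similarity of the term's contextual embeddings. A sketch assuming the embeddings are already available from some language model (the vectors and the 0.8 threshold are invented for illustration):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def same_sense(emb_a, emb_b, threshold=0.8):
    """WiC-style binary decision from two contextual embeddings of the
    same target term."""
    return cosine(emb_a, emb_b) >= threshold

# Toy 2-d "embeddings": identical contexts vs orthogonal contexts
decision = same_sense([1.0, 0.0], [1.0, 0.0])
```

In a real evaluation the threshold is tuned on a development split, and discriminative models can instead be fine-tuned on the labeled instance pairs directly.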


Subject(s)
Natural Language Processing , Semantics , Language
15.
Comput Biol Med ; 175: 108528, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38718665

ABSTRACT

Global eating habits cause health issues, leading people toward mindful eating. This has directed attention to applying deep learning to food-related data. The proposed work develops a new framework integrating neural networks and natural language processing for the classification of food images and automated recipe extraction. It addresses the challenges of intra-class variability and inter-class similarity in food images, which have received little attention in the literature. Firstly, a customized lightweight deep convolutional neural network model, MResNet-50, is proposed for classifying food images. Secondly, automated ingredient processing and recipe extraction are performed using natural language processing algorithms, Word2Vec and Transformers, in conjunction. Thirdly, a representational semi-structured domain ontology is built to store the relationships between cuisines, food items, and ingredients. The accuracy of the proposed framework on the Food-101 and UECFOOD256 datasets is increased by 2.4% and 7.5%, respectively, outperforming existing models in the literature such as DeepFood, CNN-Food, Wiser, and other pre-trained neural networks.


Subject(s)
Image Processing, Computer-Assisted , Natural Language Processing , Neural Networks, Computer , Humans , Image Processing, Computer-Assisted/methods , Food/classification , Deep Learning , Algorithms
16.
Sci Data ; 11(1): 482, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730023

ABSTRACT

Prolonged and excessive interaction with cyberspace poses a threat to people's health and leads to the occurrence of Cyber-Syndrome, which covers not only physiological but also psychological disorders. This paper aims to create a tree-shaped gold-standard corpus that annotates the Cyber-Syndrome, its clinical manifestations, and the acupoints that can alleviate its symptoms or signs, designating this corpus as CS-A. In the CS-A corpus, this paper defines six entities and relations subject to annotation. In total, 448 texts were annotated manually. After three rounds of updating the annotation guidelines, the inter-annotator agreement (IAA) improved significantly, resulting in a higher IAA score of 86.05%. The purpose of constructing the CS-A corpus is to increase awareness of the Cyber-Syndrome and draw attention to its subtle impact on people's health. Meanwhile, the annotated corpus promotes the development of natural language processing technology. Model experiments can be implemented based on this corpus, such as optimizing and improving models for discontinuous entity recognition, nested entity recognition, etc. The CS-A corpus has been uploaded to figshare.


Subject(s)
Acupuncture Points , Humans , Natural Language Processing , Computers , Internet
17.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38662583

ABSTRACT

MOTIVATION: The rapid expansion of Bioinformatics research has led to a proliferation of computational tools for scientific analysis pipelines. However, constructing these pipelines is a demanding task, requiring extensive domain knowledge and careful consideration. As the Bioinformatics landscape evolves, researchers, both novice and expert, may feel overwhelmed in unfamiliar fields, potentially leading to the selection of unsuitable tools during workflow development. RESULTS: In this article, we introduce the Bioinformatics Tool Recommendation system (BTR), a deep learning model designed to recommend suitable tools for a given workflow-in-progress. BTR leverages recent advances in graph neural network technology, representing the workflow as a graph to capture essential context. Natural language processing techniques enhance tool recommendations by analyzing associated tool descriptions. Experiments demonstrate that BTR outperforms the existing Galaxy tool recommendation system, showcasing its potential to streamline scientific workflow construction. AVAILABILITY AND IMPLEMENTATION: The Python source code is available at https://github.com/ryangreenj/bioinformatics_tool_recommendation.


Subject(s)
Computational Biology , Software , Workflow , Computational Biology/methods , Deep Learning , Natural Language Processing
18.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38609331

ABSTRACT

Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.
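The extraction task behind such a competition can be sketched with a rule-based baseline: flag a gene pair when an interaction trigger word appears between the two mentions. The trigger list and the example sentence are hypothetical, and PEDD systems use trained language models rather than patterns like this.

```python
import re

# Hypothetical trigger vocabulary for interaction events
TRIGGERS = {"binds", "phosphorylates", "activates", "inhibits"}

def extract_ppi(sentence, gene_a, gene_b):
    """Return (gene_a, trigger, gene_b) if a trigger word links the pair."""
    m = re.search(rf"{gene_a}\s+(\w+)\s+{gene_b}", sentence)
    if m and m.group(1) in TRIGGERS:
        return (gene_a, m.group(1), gene_b)
    return None

print(extract_ppi("MEK1 phosphorylates ERK2 in vitro.", "MEK1", "ERK2"))
# ('MEK1', 'phosphorylates', 'ERK2')
```

Trained models replace the fixed trigger list with learned context, which is what lets them distinguish the 12 relation types the competition defines.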


Subject(s)
Drug Discovery , Natural Language Processing , Signal Transduction
19.
BMC Med Inform Decis Mak ; 24(1): 107, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654295

ABSTRACT

BACKGROUND: This study aims to propose a semi-automatic method for monitoring the waiting times of follow-up examinations within the National Health System (NHS) in Italy, which is currently not possible due to the absence of the necessary structured information in the official databases. METHODS: A Natural Language Processing (NLP) based pipeline has been developed to extract the waiting time information from the text of referrals for follow-up examinations in the Lombardy Region. A manually annotated dataset of 10 000 referrals has been used to develop the pipeline and another manually annotated dataset of 10 000 referrals has been used to test its performance. Subsequently, the pipeline has been used to analyze all 12 million referrals prescribed in 2021 and performed by May 2022 in the Lombardy Region. RESULTS: The NLP-based pipeline exhibited high precision (0.999) and recall (0.973) in identifying waiting time information from referrals' texts, with high accuracy in normalization (0.948-0.998). The overall reporting of timing indications in referrals' texts for follow-up examinations was low (2%), showing notable variations across medical disciplines and types of prescribing physicians. Among the referrals reporting waiting times, 16% experienced delays (average delay = 19 days, standard deviation = 34 days), with significant differences observed across medical disciplines and geographical areas. CONCLUSIONS: The use of NLP proved to be a valuable tool for assessing waiting times in follow-up examinations, which are particularly critical for the NHS due to the significant impact of chronic diseases, where follow-up exams are pivotal. Health authorities can exploit this tool to monitor the quality of NHS services and optimize resource allocation.
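The extraction-plus-normalization step such a pipeline performs can be sketched with pattern matching: find a timing phrase and convert it to a number of days. The patterns below are hypothetical English stand-ins, since the actual pipeline parses Italian-language referral text.

```python
import re

# Hypothetical timing phrases mapped to a day count
PATTERNS = [
    (re.compile(r"within\s+(\d+)\s+days?"), lambda m: int(m.group(1))),
    (re.compile(r"within\s+(\d+)\s+months?"), lambda m: int(m.group(1)) * 30),
]

def extract_waiting_days(text):
    """Return the indicated waiting time in days, or None if absent."""
    for pattern, to_days in PATTERNS:
        m = pattern.search(text.lower())
        if m:
            return to_days(m)
    return None

print(extract_waiting_days("Follow-up echocardiogram within 3 months"))  # 90
```

Comparing extracted values against a manually annotated gold set is what yields the precision, recall, and normalization-accuracy figures the study reports.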


Subject(s)
Natural Language Processing , Referral and Consultation , Humans , Italy , Waiting Lists , Time Factors
20.
Sci Rep ; 14(1): 9035, 2024 04 19.
Article in English | MEDLINE | ID: mdl-38641674

ABSTRACT

Physicians' letters are the optimal source of diagnoses for registries. However, most registries require diagnosis codes such as ICD-10. We herein describe an algorithm that infers ICD-10 codes from German ophthalmologic physicians' letters. We assess the method in three German eye hospitals. Our algorithm is based on the nearest-neighbor method as well as on a large thesaurus for ICD-10 codes. This thesaurus was embedded into a Word2Vec space created from anonymized physicians' reports of the first hospital. For evaluation, each of the three hospitals sent all diagnoses taken from 100 letters. The inferred ICD-10 codes were evaluated for correctness by the senders. A total of 3332 natural language terms had been sent in (812 hospital one, 1473 hospital two, 1047 hospital three). A total of 526 non-diagnoses were excluded upfront. In total, 2806 ICD-10 codes were inferred (771 hospital one, 1226 hospital two, 809 hospital three). In the first hospital, 98% were fully correct and 99% correct at the level of the superordinate disease concept. The percentages in hospital two were 69% and 86%. The respective numbers for hospital three were 69% and 91%. Our simple method is capable of inferring ICD-10 codes for German natural language diagnoses, especially when the embedding space has been built with physicians' letters from the same hospital. The method may yield sufficient accuracy for many tasks in the multi-centric setting and can easily be adapted to other languages/specialities.
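The nearest-neighbor lookup at the core of such a method can be sketched over a toy embedding space: pick the thesaurus entry whose vector is most similar (by cosine) to the query diagnosis vector. The 3-dimensional vectors below are illustrative only; a real system embeds a large thesaurus with Word2Vec in a much higher-dimensional space.

```python
import math

# Toy thesaurus: ICD-10 code -> hypothetical embedding vector
THESAURUS = {
    "H25.9": [0.9, 0.1, 0.0],  # unspecified age-related cataract
    "H40.9": [0.1, 0.9, 0.1],  # unspecified glaucoma
    "H52.4": [0.0, 0.2, 0.9],  # presbyopia
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_icd10(query_vec):
    """Return the ICD-10 code whose thesaurus vector is most similar."""
    return max(THESAURUS, key=lambda code: cosine(query_vec, THESAURUS[code]))

print(nearest_icd10([0.85, 0.15, 0.05]))  # H25.9
```

Building the embedding space from the same hospital's letters keeps local phrasing close to the thesaurus entries, which is consistent with hospital one's higher accuracy.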


Subject(s)
International Classification of Diseases , Physicians , Humans , Natural Language Processing , Hospitals , Registries